

Realizing value with AI inference at scale and in production

MIT Technology Review

Training an AI model to predict equipment failures is an engineering achievement. But it's not until prediction meets action--the moment that model successfully flags a malfunctioning machine--that true business transformation occurs. One technical milestone lives in a proof-of-concept deck; the other meaningfully contributes to the bottom line. Craig Partridge, worldwide senior director of Digital Next Advisory at HPE, believes the true value of AI lies in inference: "Inference is where AI earns its keep. It's the operational layer that puts all that training to use in real-world workflows."


Rescaling-Aware Training for Efficient Deployment of Deep Learning Models on Full-Integer Hardware

Mueller, Lion, Garcia-Ortiz, Alberto, Najafi, Ardalan, Fuks, Adam, Bamberg, Lennart

arXiv.org Artificial Intelligence

Integer AI inference significantly reduces computational complexity in embedded systems. Quantization-aware training (QAT) helps mitigate the accuracy degradation associated with post-training quantization, but it still overlooks the impact of integer rescaling during inference, a hardware-costly operation in integer-only AI inference. This work shows that rescaling cost can be reduced dramatically post-training by applying stronger quantization to the rescale multiplicands at no loss in model quality. Furthermore, we introduce Rescale-Aware Training, a fine-tuning method for ultra-low-bit-width rescaling multiplicands. Experiments show that even with 8x-reduced rescaler widths, full accuracy is preserved through minimal incremental retraining. This enables more energy-efficient and cost-efficient AI inference for resource-constrained embedded systems.
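To make the rescaling operation concrete: in integer-only inference, a 32-bit accumulator is typically mapped back to a narrow activation by multiplying with a fixed-point multiplier M and right-shifting. The sketch below is illustrative only (it is not the paper's implementation): it approximates a floating-point scale as an integer multiplier of a chosen bit-width, then shows how rescaling is applied. Narrowing the bit-width of M is the cost reduction the abstract describes. The scale value 0.00372 is an arbitrary example.

```python
def make_multiplier(scale: float, bits: int):
    """Approximate a float scale as M * 2**-shift, with M an integer
    of exactly `bits` significant bits (M in [2**(bits-1), 2**bits))."""
    shift = 0
    while round(scale * (1 << shift)) < (1 << (bits - 1)):
        shift += 1
    return round(scale * (1 << shift)), shift

def rescale(acc: int, M: int, shift: int) -> int:
    """Integer requantization: multiply the accumulator by M and apply
    a rounding right-shift (the hardware-costly step in question)."""
    return (acc * M + (1 << (shift - 1))) >> shift

# An 8-bit multiplier tracks the float scale closely for typical
# accumulator magnitudes, illustrating why narrow rescalers can suffice.
M8, s8 = make_multiplier(0.00372, bits=8)
M16, s16 = make_multiplier(0.00372, bits=16)
print(rescale(10000, M8, s8), rescale(10000, M16, s16))
```

Comparing the 8-bit and 16-bit rescaler outputs over a range of accumulator values shows they agree to within one unit, which is the intuition behind quantizing the rescale multiplicands aggressively.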


Adaptive Approach to Enhance Machine Learning Scheduling Algorithms During Runtime Using Reinforcement Learning in Metascheduling Applications

Alshaer, Samer, Khalifeh, Ala, Obermaisser, Roman

arXiv.org Artificial Intelligence

Metascheduling in time-triggered architectures has been crucial in adapting to dynamic and unpredictable environments, ensuring the reliability and efficiency of task execution. However, traditional approaches face significant challenges when training Artificial Intelligence (AI) scheduling inferences offline, particularly due to the complexities involved in constructing a comprehensive Multi-Schedule Graph (MSG) that accounts for all possible scenarios. The process of generating an MSG that captures the vast probability space, especially when considering context events like hardware failures, slack variations, or mode changes, is resource-intensive and often infeasible. To address these challenges, we propose an adaptive online learning unit integrated within the metascheduler to enhance performance in real-time. The primary motivation for developing this unit stems from the limitations of offline training, where the MSG created is inherently a subset of the complete space, focusing only on the most probable and critical context events. In the online mode, Reinforcement Learning (RL) plays a pivotal role by continuously exploring and discovering new scheduling solutions, thus expanding the MSG and enhancing system performance over time. This dynamic adaptation allows the system to handle unexpected events and complex scheduling scenarios more effectively. Several RL models were implemented within the online learning unit, each designed to address specific challenges in scheduling. These models not only facilitate the discovery of new solutions but also optimize existing schedulers, particularly when stricter deadlines or new performance criteria are introduced. By continuously refining the AI inferences through real-time training, the system remains flexible and capable of meeting evolving demands, thus ensuring robustness and efficiency in large-scale, safety-critical environments.
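The core idea of the online learning unit, picking better schedules per context event as reward feedback accumulates, can be sketched with a toy epsilon-greedy learner. This is not the paper's system: the context events, candidate schedules, and makespan values below are hypothetical, and the update is a one-step Q-learning rule (gamma = 0, i.e., a contextual bandit) rather than a full MSG-expanding RL model.

```python
import random

# Hypothetical observed rewards (negative makespan; higher is better)
# for each (context event, candidate schedule) pair.
REWARDS = {
    ("nominal", "sched_A"): -10.0, ("nominal", "sched_B"): -14.0,
    ("hw_fault", "sched_A"): -30.0, ("hw_fault", "sched_B"): -18.0,
}
CONTEXTS = ("nominal", "hw_fault")
ACTIONS = ("sched_A", "sched_B")

def train(episodes=2000, alpha=0.1, eps=0.2, seed=0):
    """Epsilon-greedy value learning over schedule choices per context."""
    rng = random.Random(seed)
    Q = {(c, a): 0.0 for c in CONTEXTS for a in ACTIONS}
    for _ in range(episodes):
        c = rng.choice(CONTEXTS)                      # context event arrives
        if rng.random() < eps:                        # explore new schedules
            a = rng.choice(ACTIONS)
        else:                                         # exploit best-known one
            a = max(ACTIONS, key=lambda x: Q[(c, x)])
        r = REWARDS[(c, a)] + rng.uniform(-1, 1)      # noisy makespan feedback
        Q[(c, a)] += alpha * (r - Q[(c, a)])          # one-step update
    return Q

def best_schedule(Q, context):
    return max(ACTIONS, key=lambda a: Q[(context, a)])
```

After training, the learner prefers the lower-makespan schedule in each context, mirroring how continuous exploration lets the metascheduler refine its schedule choices as conditions change.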


ISLE: An Intelligent Streaming Framework for High-Throughput AI Inference in Medical Imaging

Kulkarni, Pranav, Garin, Sean, Kanhere, Adway, Siegel, Eliot, Yi, Paul H., Parekh, Vishwa S.

arXiv.org Artificial Intelligence

As the adoption of Artificial Intelligence (AI) systems within the clinical environment grows, limitations in bandwidth and compute can create communication bottlenecks when streaming imaging data, leading to delays in patient care and increased costs. Addressing these bottlenecks would require healthcare providers and AI vendors to invest in greater computational infrastructure, dramatically increasing costs. To that end, we developed ISLE, an intelligent streaming framework for high-throughput, compute- and bandwidth-optimized, and cost-effective AI inference for clinical decision-making at scale. In our experiments, ISLE reduced data transmission by 98.02% and decoding time by 98.09% on average, while increasing throughput by 2,730%. We show that ISLE yields faster turnaround times and reduces the overall cost of data, transmission, and compute, without negatively impacting clinical decision-making using AI systems.


Open RAN platforms to support far edge AI inference

#artificialintelligence

A key benefit of using general-purpose processors to implement open RAN/vRAN is that the same platforms can be used to support AI inference and other applications at the far edge of the network, such as cell site routers (CSRs) and content delivery and hosting. These edge platforms can be used to host virtualized applications closer to the user, offering significant benefits in terms of lower latency and shared infrastructure. To find out more about which applications service providers plan to support on shared far edge solutions and how they plan to deploy open RAN and vRAN platforms and architectures for 5G networks, Heavy Reading ran an exclusive survey of individuals working for operators with mobile network businesses. The results are presented in an analyst report, Open RAN Platforms and Architectures Operator Survey Report, that can be downloaded for free here. The survey presented options for five edge applications that can share server platforms with virtualized open RAN baseband implementations.


The Odious Comparisons Of GPU Inference Performance And Value

#artificialintelligence

While AI training dims the lights at hyperscalers and cloud builders and costs billions of dollars a year, in the long run there will be far more aggregate processing done on AI inference than on AI training. It might be a factor of 2X to 3X more compute capacity soon, and anywhere from 10X to 100X more within a decade. What we all do suspect, however, is that there will be relatively few heavy-duty AI training devices and platforms that use them, and myriad AI inference devices. And so the relative performance and price/performance of compute engines that run inference are going to be important as they are deployed at scale. Meta Platforms helped invent many of the machine learning techniques and technologies being deployed in production these days, and it was no surprise to us that the company had created a unified inference framework, called AITemplate, which it open sourced and described earlier this month in a Meta AI engineering blog post.


FAIR principles for AI models with a practical application for accelerated high energy diffraction microscopy

Ravi, Nikil, Chaturvedi, Pranshu, Huerta, E. A., Liu, Zhengchun, Chard, Ryan, Scourtas, Aristana, Schmidt, K. J., Chard, Kyle, Blaiszik, Ben, Foster, Ian

arXiv.org Artificial Intelligence

A concise and measurable set of FAIR (Findable, Accessible, Interoperable, and Reusable) principles for scientific data is transforming the state of practice for data management and stewardship, supporting and enabling discovery and innovation. Learning from this initiative, and acknowledging the impact of artificial intelligence (AI) in the practice of science and engineering, we introduce a set of practical, concise, and measurable FAIR principles for AI models. We showcase how to create and share FAIR data and AI models within a unified computational framework combining the following elements: the Advanced Photon Source at Argonne National Laboratory, the Materials Data Facility, the Data and Learning Hub for Science, funcX, and the Argonne Leadership Computing Facility (ALCF), in particular the ThetaGPU supercomputer and the SambaNova DataScale system at the ALCF AI Testbed. We describe how this domain-agnostic computational framework may be harnessed to enable autonomous AI-driven discovery.


How AI is reshaping the edge computing landscape

#artificialintelligence

How much computing power is needed at the edge? How much memory and storage are enough for AI at the edge? Minimum requirements are growing as AI opens the door to innovative applications that need more and faster processing, storage, and memory. How can today's memory and storage technologies meet the stringent requirements of these challenging new edge applications? Edge includes any distributed application where specific processing occurs away from the server, even if the data is eventually sent to a data center.


Seeking AI resources for students in your university classroom?

#artificialintelligence

It's no secret that artificial intelligence (AI) is one of the hottest topics in the tech world today. Every day, it seems like there's a new story about how AI is being used to improve some aspect of our lives, from personal assistants to driverless cars. Given all the hype, it's no wonder that educators are eager to introduce AI concepts to their students. Now, thanks to Intel's five-module teaching kit for AI inference, which teaches the Intel Distribution of OpenVINO toolkit, it is easier than ever to introduce the concepts of deep learning AI to students. Give your students hands-on coding experience with this teaching kit, which comes with a lesson plan, five modules of workbooks, videos, quizzes, and Jupyter* Notebook coding lab tutorials.


What Nvidia's new MLPerf AI benchmark results really mean

#artificialintelligence

Nvidia released results today against new MLPerf industry-standard artificial intelligence (AI) benchmarks for its AI-targeted processors. While the results look impressive, it is important to note that some of the comparisons made with other systems are not really apples-to-apples. For instance, the Qualcomm systems run at a much smaller power footprint than the H100 and are targeted at market segments similar to the A100, where the test comparisons are much more equitable.